This project on subpixel target detection relates to research in the optimization of
three-dimensional computing structures for use in target detection and to research
in the reduction of an optimum computing structure to an efficiently designed
silicon chip.


Technical Progress:

During November, work on this project concentrated on establishing our working
relationship with Visual Information Technology (Plano, Texas). The Principal
Investigator visited Visual Information Technology in mid-November. Prior to this
visit, the feasibility of reducing the patented Logical Transform Image Processor
directly to silicon was analyzed (see U.S. Patent 4,641,351, 3 Feb 1987). This
computing structure is shown in the first of the attached drawings. The design
illustrated assumes a 16-bit data bus which delivers the values of 16 voxels to
7 replicate memories, each holding 128K bits. These are indicated as memories
M1 through M7.


Seven replicate memories are required because the three-dimensional neighborhood is
the tetradecahedron, which includes 13 voxels. Two of these voxels lie on isolated
lines, eight lie in four pairs, and three form a triple. This last configuration
includes the central voxel. This makes seven clusters of voxels, hence the seven
replicate memories. As the replicate memories are loaded, the seven address
counters associated with them are preset to zero and then incremented as the
16-bit words arrive and are stored. During this time there are no outputs to the
word shifters. After the seven replicate memories are full of data from the
three-dimensional workspace, the address multiplexor is used to enter offset
addresses, as appropriate, so that words read from the seven replicate memories to
the word shifters during this epoch contain the binary values of 16 central voxels
and all of their associated neighbors. These data are fed over 118 wires via a wire
matrix to address the 16 LUTs (lookup tables). Beforehand, of course, the LUTs are
all loaded with the same algorithmic structure. Structures which can be loaded
provide for various types of erosion, dilation, or skeletonization. In target
detection the primary algorithms are those of erosion and skeletonization.
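
The counts in this paragraph can be cross-checked with a few lines of arithmetic.
The sketch below assumes (this is not stated in the report) that each cluster lies
along the word direction of its memory, so that adjacent central voxels share
neighbor bits and a cluster of size s needs 16 + s - 1 bits to serve 16 central
voxels. Under that assumption the totals agree with the 13 voxels, seven clusters,
and 118 wires quoted above.

```python
# Cluster sizes taken from the text: two isolated voxels, four pairs, one triple.
CLUSTER_SIZES = [1, 1, 2, 2, 2, 2, 3]
WORD_WIDTH = 16  # central voxels processed per epoch


def wires_needed(cluster_sizes, word_width):
    # Assumption: a cluster of size s spans s consecutive bit positions, so
    # serving word_width adjacent central voxels takes word_width + s - 1 bits.
    return sum(word_width + s - 1 for s in cluster_sizes)


voxels = sum(CLUSTER_SIZES)        # voxels in the neighborhood
memories = len(CLUSTER_SIZES)      # one replicate memory per cluster
wires = wires_needed(CLUSTER_SIZES, WORD_WIDTH)
print(voxels, memories, wires)     # 13 7 118
```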

The output address multiplexor is used for two purposes. Initially, it is used to
load the LUTs with the algorithmic structure. Subsequently, during the processing
epoch, it permits the sixteen 13-bit tetradecahedron values to feed their
appropriate LUTs. The result is the return to the host of 16 new values for the
16 tetradecahedrons being processed. This operation continues until all voxels in
the workspace have been operated upon.
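
As an illustration of what loading the LUTs with an algorithmic structure can mean,
the sketch below builds 13-address tables for a plain binary erosion and a plain
binary dilation and applies them to 16 neighborhood values. The definitions used
(erosion keeps a voxel only if all 13 bits are set, dilation sets it if any bit is
set) are the textbook ones and are an assumption here. The report does not list
the table contents, and the bit order within an address is likewise hypothetical.

```python
ADDRESS_BITS = 13                  # one bit per voxel of the neighborhood
TABLE_SIZE = 1 << ADDRESS_BITS     # 8192 one-bit entries per LUT
ALL_SET = TABLE_SIZE - 1


def build_erosion_lut():
    # Assumed definition: output 1 only when every voxel in the neighborhood is 1.
    return [1 if address == ALL_SET else 0 for address in range(TABLE_SIZE)]


def build_dilation_lut():
    # Assumed definition: output 1 when any voxel in the neighborhood is 1.
    return [1 if address != 0 else 0 for address in range(TABLE_SIZE)]


def process_epoch(lut, neighborhoods):
    """Look up one new voxel value for each 13-bit neighborhood value."""
    return [lut[n & ALL_SET] for n in neighborhoods]


erosion = build_erosion_lut()
dilation = build_dilation_lut()
sample = [ALL_SET, 0, 0b1000000000000, ALL_SET - 1] * 4   # 16 neighborhoods
eroded = process_epoch(erosion, sample)
dilated = process_epoch(dilation, sample)
print(eroded[:4], dilated[:4])
```

Because the algorithm lives entirely in the table, switching from erosion to
dilation changes only the data loaded, not the processing path.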

Assuming four devices per cell, and totaling the number of devices in the primary
device consumers, namely the replicate workspace memories and the LUTs, we estimate
that about four million devices are needed to put the entire computing structure
on a single chip. This, of course, could be reduced by half (to two million devices)
by reverting to a byte-processed workspace. Even at that level the design is
infeasible for a single chip.
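
The four-million figure can be reproduced from the numbers given earlier. The
sketch below assumes that each replicate workspace memory holds 128K (131,072)
bits, that each LUT holds 2^13 one-bit entries, and that a byte-processed design
halves both the memory capacity and the number of LUTs. The last assumption is
inferred from the stated factor of two rather than given in the report.

```python
DEVICES_PER_CELL = 4  # figure stated in the text


def device_estimate(num_memories, bits_per_memory, num_luts, lut_address_bits):
    """Devices in the workspace memories plus the LUTs, at four devices per cell."""
    memory_cells = num_memories * bits_per_memory
    lut_cells = num_luts * (1 << lut_address_bits)
    return DEVICES_PER_CELL * (memory_cells + lut_cells)


# 16-bit design: 7 memories of 128K bits and 16 LUTs with 13 address bits.
word_design = device_estimate(7, 128 * 1024, 16, 13)
# Byte design (assumed): 7 memories of 64K bits and 8 LUTs with 13 address bits.
byte_design = device_estimate(7, 64 * 1024, 8, 13)
print(word_design, byte_design)   # 4194304 2097152
```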

Using the results given in Technical Report Number 1, and further information
furnished by Visual Information Technologies, the Principal Investigator selected a
modified byte-addressed processing system as shown in the second figure attached.
This configuration takes full advantage of high-speed multiplexing and shifting
(as is feasible on a single chip) to produce the same results as a fully configured
(two million device) processor with only about twenty thousand devices.

The operation of the new configuration is as follows. Using a byte (8-bit) input
via an input data multiplexor, 7 bytes are stored in 7 one-byte registers or "bit
shifters." These bytes are stored sequentially at high speed from the 7 appropriate
lines of the workspace. Next, 13 hard-wired outputs from leading bits in these
bit shifters are used via the address multiplexor to address a single LUT eight
sequential times. The LUT is reduced from a 13-address LUT to a 12-address LUT by
extracting the wire from the central voxel to feed the central voxel logic. This
logic is also fed by the output from the LUT. Thus the twelve vertices of the
tetradecahedron are processed by the LUT, yielding a one-bit output. This output is
then combined with the value of the central voxel in the central voxel logic. The
output of this logic is fed to an 8-bit shift register which accumulates, after
eight shifts, the one-byte output giving the new values of the eight central
voxels being processed.
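
The shift-and-lookup sequence described above can be modeled in software. The
following is a minimal sketch, not the chip design: the assignment of taps to
registers, the position of the central voxel (taken here as the middle tap of the
triple), the address bit order, and the `combine` function standing in for the
central voxel logic are all hypothetical. Registers are zero-filled as they shift,
so neighbors that would come from adjacent bytes are not modeled.

```python
# Taps per register follow the cluster sizes in the text (13 taps in total).
TAPS_PER_REGISTER = (1, 1, 2, 2, 2, 2, 3)
CENTER_REGISTER, CENTER_TAP = 6, 1   # hypothetical: middle tap of the triple


def process_byte(registers, lut, combine):
    """Return the output byte for 7 input bytes, a 4096-entry LUT, and a
    combine(center_bit, lut_bit) function modeling the central voxel logic."""
    regs = list(registers)
    out = 0
    for _ in range(8):                       # eight sequential LUT accesses
        address = 0
        center = 0
        for r, ntaps in enumerate(TAPS_PER_REGISTER):
            for t in range(ntaps):
                bit = (regs[r] >> (7 - t)) & 1   # tap on the leading bits
                if r == CENTER_REGISTER and t == CENTER_TAP:
                    center = bit                 # bypasses the LUT
                else:
                    address = (address << 1) | bit
        out = (out << 1) | combine(center, lut[address])   # output shift register
        regs = [(v << 1) & 0xFF for v in regs]             # shift all bit shifters
    return out


def and_logic(center, lut_bit):
    return center & lut_bit


# Assumed erosion-style table: 1 only when all twelve neighbor bits are set.
erosion_lut = [1 if a == 0xFFF else 0 for a in range(1 << 12)]
pass_through_lut = [1] * (1 << 12)

full = process_byte([0xFF] * 7, erosion_lut, and_logic)
empty = process_byte([0x00] * 7, erosion_lut, and_logic)
print(f"{full:08b} {empty:08b}")
```

With every input bit set, the last two output bits are zero only because of the
zero fill at the byte boundary; a real design would need neighbor bits from the
adjacent bytes, which the report does not detail.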

Although this new configuration is sequential/parallel rather than fully parallel,
it has overwhelming advantages in terms of our ability to reduce it to silicon.
Also, because all computations are on-chip, an extremely high clock rate can be
employed. This will allow us to recoup any speed lost by shifting from a fully
parallel to a sequential/parallel mode of operation. Also, since only twenty
thousand devices are required, several of these byte processors can be configured
per chip. At present we are looking at the possibility of having four complete
configurations per chip so that 32 total voxels will be processed in one
processing epoch. A further advantage of placing several complete byte processors
on each chip is that chip yield (not processor yield) will reach essentially 100%,
because the chip would be usable if only a single one of its four processing
structures is in working condition.
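
The yield argument can be made quantitative under a simple model. The sketch below
assumes that each processing structure works independently with the same
probability, which the report does not state, and the processor yields used are
illustrative values rather than measured ones.

```python
def usable_chip_fraction(processor_yield, processors_per_chip=4):
    """Probability that at least one processor on a chip works,
    assuming independent, identically likely defects (an assumption)."""
    return 1.0 - (1.0 - processor_yield) ** processors_per_chip


for p in (0.3, 0.5, 0.8):   # illustrative processor yields
    print(p, round(usable_chip_fraction(p), 4))
```

Under this model a processor yield of 0.5 already gives a usable-chip fraction of
0.9375, and the fraction approaches 1 as the processor yield rises.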

This property is even more important in military applications, where redundancy
matters. With four complete processing structures per chip, any one of which is
fully usable as a three-dimensional computing structure, loss of active components
during a military mission could be sensed and remedial action taken. Failing
processors could be deleted from the computing structure by simply changing the
routing of bytes between the workspace memories and the processor memories. This
redundancy therefore makes for an attractive fail-safe computing structure under
severe conditions of heat or radiation damage.
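
One way to picture the rerouting is a dispatcher that hands byte groups only to
processors still marked as working. The sketch below is purely illustrative: the
health flags, the round-robin policy, and the function name are hypothetical, and
the report does not describe how failures would be sensed.

```python
def route_bytes(byte_groups, processor_ok):
    """Assign each byte group to a working processor, round-robin.

    processor_ok is a list of booleans, one per on-chip processor (hypothetical
    health flags). Returns a list of (group, processor_index) pairs.
    """
    healthy = [i for i, ok in enumerate(processor_ok) if ok]
    if not healthy:
        raise RuntimeError("no working processor left on the chip")
    return [(group, healthy[n % len(healthy)])
            for n, group in enumerate(byte_groups)]


# Processor 1 has failed; work is spread over processors 0, 2, and 3.
plan = route_bytes(list(range(6)), [True, False, True, True])
print(plan)
```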

Plans for December 1988:

Plans for December 1988 (to be documented during the next two technical progress
reports) are twofold. Work will continue with Visual Information Technology on
chip design. At the same time analytical work will continue, still based on
two-dimensional processors, in order to determine the optimum computing structure
in terms of computing rate (in pixels per unit time per device).